Advances in the Exon-Intron Database (EID)

Authors

  • Valery Shepelev
  • Alexei Fedorov
Abstract

Investigating exon-intron gene structures is a non-trivial task because of the enormous expansion of eukaryotic genomes, the great variety of gene forms, and imperfections in sequence data. A number of available information systems covering various gene characteristics complement each other and are indispensable for many genomic studies. Among them, the Exon-Intron Database (EID) is a good choice for large-scale computational examination of exon/intron structure and splicing. It has many internal filters that check sequence quality, consistency of gene descriptions, conformance to standards, and possible errors. Recent innovations in EID are described. The collection of exons and introns has been extended beyond coding regions, and current versions of EID also contain data on the untranslated regions of gene sequences. Intron-less genes are included as a special part of EID. For species with entirely sequenced genomes, species-specific databases have been generated. A novel Mammalian Orthologous Intron Database (MOID) has been introduced; it includes the full set of introns from orthologous genes that occupy the same positions relative to the reading frames. Examples of statistical analyses of gene sequences using EID are provided. We present the latest data from our comparison of intron positions in 11,025 orthologous genes of human, mouse and rat, and find no convincing cases of intron gain. We discuss relevant data-quality issues of genomic databases. In particular, 5% of genes in genomic databases contain internal stop codons. This is due to a combination of biological causes and errors in sequence annotation. The EID is freely available at www.meduohio.edu/bioinfo/eid/.
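To make the internal-stop-codon issue mentioned in the abstract concrete, the following is a minimal sketch of how such a check could be run on a single coding sequence. It is not the EID filtering code: the function name `internal_stop_positions` is hypothetical, the standard genetic code is assumed, and the input is assumed to be a spliced coding sequence that starts in frame.

```python
from typing import List

# Stop codons of the standard genetic code (an assumption; some
# organisms and organelles use different codes).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds: str) -> List[int]:
    """Return 0-based nucleotide offsets of in-frame stop codons,
    excluding the final complete codon of the sequence.

    Hypothetical helper for illustration only.
    """
    seq = cds.upper()
    # Ignore any trailing 1-2 nucleotides that do not form a full codon.
    usable = len(seq) - len(seq) % 3
    last_start = usable - 3  # start offset of the final complete codon
    return [
        i
        for i in range(0, usable, 3)
        if i < last_start and seq[i:i + 3] in STOP_CODONS
    ]


if __name__ == "__main__":
    # ATG TAA AAA TAG: the TAA at offset 3 is internal, the final TAG is not.
    print(internal_stop_positions("ATGTAAAAATAG"))
```

A sequence flagged by such a check would still need manual inspection, since the abstract attributes internal stops both to biological causes and to annotation errors.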

Similar resources

EID: the Exon-Intron Database: an exhaustive database of protein-coding intron-containing genes

To aid studies of molecular evolution and to assist in gene prediction research, we have constructed an Exon-Intron Database (EID) in FASTA format. Currently, the database is derived from GenBank release 112, and it contains 51 289 protein-coding genes (287 209 exons) that harbor introns, along with extensive descriptions of each gene and its DNA and protein sequences, as well as splice motif i...

Bioinformatic analysis of exon repetition, exon scrambling and trans-splicing in humans

MOTIVATION Using bioinformatic approaches we aimed to characterize poorly understood abnormalities in splicing known as exon scrambling, exon repetition and trans-splicing. RESULTS We developed a software package that allows large-scale comparison of all human expressed sequence tags (EST) sequences to the entire set of human gene sequences. Among 5,992,495 EST sequences, 401 cases of exon re...

Short nucleotide sequences signal spliceosomal binding in nucleic acids.

We have explored the regions around the splice sites of human introns and exons from the Exon-Intron Database (EID) and located a number of short 6-nucleotide and 7-nucleotide sequences that are relatively common in these regions. We expect these short sequences to play an important role in the selection of the appropriate splicing process. We propose that the external signals via short recogniti...

Novel Single Nucleotide Polymorphisms (SNPs) in Intron 2 and Exon 3 Regions of Leptin Gene in Sumba Ongole Cattle

The bovine leptin (LEP) gene was widely used as a candidate gene for molecular selection to improve productivity traits of cattle. This study was carried out to identify single nucleotide polymorphisms (SNPs) in the LEP gene of Sumba Ongole (SO, Bos indicus) cows using sequencing method. A total of 31 animals were used in this study for analyses. Research showed that total of 16 SNPs w...

Loss of Chloroplast trnLUAA Intron in Two Species of Hedysarum (Fabaceae): Evolutionary Implications

Previous studies have indicated that in all land plants examined to date, the chloroplast gene trnLUAA is interrupted by a single group I intron ranging from 250 to over 1400 bp. The parasitic Epifagus virginiana has lost, however, the entire gene. We report that the intron is missing from the chloroplast genome of two arctic species of the legume genus Hedysarum (H. alpinum, H. ...


Journal:
  • Briefings in Bioinformatics

Volume 7, Issue 2

Publication date: 2006